method for generating order of magnitude estimates for the $t$-step transition probabilities, for any
$t$. We then notice that algorithms of the simulated annealing type may be represented by a Markov
The main objective of this paper is the characterization of the cooling schedules under which
a  simulated  annealing  algorithm  converges  to  a  set  of  desired  states,  such  as  the  set  where  some 
cost function is minimized, thus generalizing the results of Hajek [9]. The method we follow is based on the observation that in simulated annealing algorithms the “temperature” remains approximately constant for sufficiently long times. For this reason, we may exploit bounds and estimates which are valid for singularly perturbed, approximately stationary Markov chains and obtain interesting conclusions for the simulated annealing algorithm. In the course of developing our result on simulated annealing, we derive certain results on approximately stationary, singularly perturbed Markov chains which seem to be of independent interest.

The structure of the paper is the following. In Section 2 we assume that we are dealing with a Markov chain in which each of the one-step transition probabilities is roughly proportional to a certain power of $\epsilon$, where $\epsilon$ is a small parameter. We then present an algorithm, consisting of the solution of certain shortest path problems and some graph theoretic manipulations, which provides estimates for the transition probabilities of the Markov chain for any time between 0 and $1/\epsilon$. Then, in Section 3, we indicate how the procedure of Section 2 may be applied recursively to produce similar estimates on the transition probabilities for all times. In Section 4 we use the results of Section 3 to characterize the convergence of the simulated annealing algorithm.


In this section we derive order of magnitude estimates on the transition probabilities of a non-stationary Markov chain. Our results are based on the assumption that such order of magnitude information is available on the one-step transition probabilities of the Markov chain.

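We start with some notation. We use $\mathbb{N}$ and $\mathbb{N}_0$ to denote the positive and the nonnegative integers, respectively. We also let $\mathcal{U}$ denote the set of functions $f:(0,\infty)\mapsto(0,\infty)$ such that for every $n\in\mathbb{N}_0$ there exists some $c_n>0$ such that $f(\epsilon)\le c_n\epsilon^n$, $\forall\epsilon>0$. Notice that $\mathcal{U}$ has the property that $f(\epsilon)/\epsilon^n\in\mathcal{U}$, $\forall f\in\mathcal{U}$, $\forall n\in\mathbb{N}$. Also notice that $c^{1/\epsilon}\in\mathcal{U}$, for any $c\in(0,1)$.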
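The last remark can be made quantitative with a routine calculus step that we add only for illustration (it is not part of the paper): maximizing $\log\big(c^{1/\epsilon}/\epsilon^n\big)=\ln c/\epsilon-n\ln\epsilon$ over $\epsilon$ gives $\epsilon^*=\ln(1/c)/n$, so for $n\ge1$ the smallest admissible constant is $c_n=\big(n/(e\ln(1/c))\big)^n$. The Python sketch below (function names are ours) evaluates the ratio in log space, to avoid underflow, together with this constant.

```python
import math


def ratio(c: float, eps: float, n: int) -> float:
    """Return c**(1/eps) / eps**n, computed in log space to avoid underflow."""
    return math.exp(math.log(c) / eps - n * math.log(eps))


def smallest_constant(c: float, n: int) -> float:
    """Smallest c_n with c**(1/eps) <= c_n * eps**n for all eps > 0 (n >= 1).

    log(ratio) = log(c)/eps - n*log(eps) is maximized at eps* = log(1/c)/n,
    which gives c_n = (n / (e * log(1/c)))**n.
    """
    return (n / (math.e * math.log(1.0 / c))) ** n


c = 0.5
for n in (1, 5, 10):
    # The ratio can be large for moderate eps but collapses as eps -> 0.
    print(n, smallest_constant(c, n), [ratio(c, eps, n) for eps in (0.1, 0.01, 0.001)])
```

For $n=10$ the constant is of order $10^7$, even though the ratio itself is below $10^{-200}$ at $\epsilon=10^{-3}$; only the existence of some finite $c_n$ matters for membership in $\mathcal{U}$.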

We consider a (generally non-stationary) finite state, discrete time Markov chain $X=\{x(t):t\ge0\}$ with state space $\{1,\ldots,N\}$. For any $t\ge0$ we let $q_{ij}(t)=P(x(t+1)=j\mid x(t)=i)$ and $p_{ij}(t)=P(x(t)=j\mid x(0)=i)$. We assume that some structural information is available on this Markov chain. More precisely, let there be given a collection $A=\{\alpha_{ij}:1\le i,j\le N\}$ of elements of $\mathbb{N}_0\cup\{\infty\}$. Let $f\in\mathcal{U}$ and let $C_1$, $C_2$ be positive constants. We assume that for some $\epsilon>0$ we have

$$C_1\epsilon^{\alpha_{ij}}\le q_{ij}(t)\le C_2\epsilon^{\alpha_{ij}},\qquad\forall t\ge0,\ \text{if }\alpha_{ij}<\infty, \tag{2.1}$$

$$0\le q_{ij}(t)\le f(\epsilon),\qquad\forall t\ge0,\ \text{if }\alpha_{ij}=\infty. \tag{2.2}$$

We call $A$ the structure of the Markov chain $X$. We make the following assumption on $A$:

$$\alpha_{ik}\le\alpha_{ij}+\alpha_{jk},\qquad\forall i,j,k. \tag{2.3}$$

We shall discuss later how this assumption may be removed. For the rest of this section we assume that $A$, $C_1$, $C_2$, $f$ are fixed and we denote by $\mathcal{M}_\epsilon(A,C_1,C_2,f)$ the set of Markov chains for which (2.1) and (2.2) hold. (Occasionally we use the shorter notation $\mathcal{M}_\epsilon$, provided that no confusion may arise.)

We classify the states in the state space by considering a Markov chain in which only those transitions from $i$ to $j$ with $\alpha_{ij}=0$ are allowed. In particular, a state $i$ is called transient if there exists some state $j$ such that $\alpha_{ij}=0$ and $\alpha_{ji}>0$. Otherwise, $i$ is called recurrent. In view of assumption (2.3), this is equivalent to the conventional definition. Let $TR$, $R$ denote the sets of transient and recurrent states, respectively. For any $i\in R$, we let $R_i=\{j:\alpha_{ij}=0\}$. We then have $j\in R_i$ if and only if $j\in R$ and $i\in R_j$; we thus obtain the usual partition of the set of recurrent states into ergodic classes. Also, notice that, for any $i\in TR$, there exists some $j\in R$ such that $\alpha_{ij}=0$.
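Since the classification uses only the zero entries of $A$, it is easy to mechanize. The Python sketch below is our illustration, under our own conventions: states are numbered from 0, $\alpha_{ij}=\infty$ is stored as `math.inf`, and the function names and the three-state structure are hypothetical (state 0 is a shallow "well", state 1 a "barrier", state 2 a deeper "well"). It also checks (2.3), on which the equivalence with the conventional definition relies.

```python
import math
from typing import Dict, List, Set, Tuple

Matrix = List[List[float]]


def satisfies_triangle(alpha: Matrix) -> bool:
    """Check assumption (2.3): alpha[i][k] <= alpha[i][j] + alpha[j][k] for all i, j, k."""
    n = len(alpha)
    return all(alpha[i][k] <= alpha[i][j] + alpha[j][k]
               for i in range(n) for j in range(n) for k in range(n))


def classify(alpha: Matrix) -> Tuple[Set[int], Set[int], Dict[int, Set[int]]]:
    """Return (TR, R, {i: R_i}) using only the zero-exponent transitions.

    State i is transient if some j has alpha[i][j] == 0 and alpha[j][i] > 0;
    for recurrent i, R_i = {j : alpha[i][j] == 0} is its ergodic class.
    """
    n = len(alpha)
    transient = {i for i in range(n)
                 if any(alpha[i][j] == 0 and alpha[j][i] > 0 for j in range(n))}
    recurrent = set(range(n)) - transient
    classes = {i: {j for j in range(n) if alpha[i][j] == 0} for i in recurrent}
    return transient, recurrent, classes


# Hypothetical structure: shallow well 0, barrier 1, deep well 2.
alpha = [[0, 1, 1],
         [0, 0, 0],
         [2, 2, 0]]
assert satisfies_triangle(alpha)
TR, R, classes = classify(alpha)
print(TR, R, classes)  # {1} {0, 2} {0: {0}, 2: {2}}
```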

Our first result provides order of magnitude estimates on the probability that a recurrent state $j$ is the first recurrent state to be visited, starting from a transient state $i$. We use the notation $T=\min\{t\ge0:x(t)\in R\}$. We also use the convention that $\epsilon^\infty=0$.

Proposition 2.1: There exist $F>0$ and $g\in\mathcal{U}$ such that for any $\epsilon>0$, $X\in\mathcal{M}_\epsilon$, $i\in TR$, $j\in R$ we have

$$C_1\epsilon^{\alpha_{ij}}\le P(x(T)=j\mid x(0)=i)\le F\epsilon^{\alpha_{ij}}+g(\epsilon). \tag{2.4}$$

Proof: Let us fix some $j\in R$. We define, for $a\in\mathbb{N}_0\cup\{\infty\}$, $S_a=\{i\in TR:\alpha_{ij}=a\}$ and $Q_a=\{i\in TR:\alpha_{ij}\ge a\}$. We then define $p_{a,\epsilon}=\sup_{X\in\mathcal{M}_\epsilon}\max_{i\in Q_a}P(x(T)=j\mid x(0)=i)$. We first prove, by induction on $a$, that for any $a<\infty$ there exists some $F_a>0$ such that $p_{a,\epsilon}\le F_a\epsilon^a$, $\forall\epsilon>0$. This is clearly true for $a=0$. Suppose it is true for all $a$ less than some positive integer $\beta$. Let $i\in Q_\beta$ and $X\in\mathcal{M}_\epsilon$. Notice that for any state $k$ we have $\alpha_{ik}+\alpha_{kj}\ge\alpha_{ij}\ge\beta$. Using (2.1) and the induction hypothesis we obtain

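$$\begin{aligned}
P(x(T)=j\mid x(0)=i)&\le\sum_{a=0}^{\beta-1}\sum_{k\in S_a}P(x(T)=j\mid x(1)=k)\,P(x(1)=k\mid x(0)=i)\\
&\quad+P(x(1)\in Q_\beta\mid x(0)=i)\max_{l\in Q_\beta}P(x(T)=j\mid x(1)=l)+P(x(1)=j\mid x(0)=i)\\
&\le\sum_{a=0}^{\beta-1}\sum_{k\in S_a}F_a\epsilon^{a}\,C_2\epsilon^{\alpha_{ik}}+(1-C_1)\,p_{\beta,\epsilon}+C_2\epsilon^{\alpha_{ij}}\\
&\le\Big[N\max_{a<\beta}\{F_a\}\,C_2+C_2\Big]\epsilon^{\beta}+(1-C_1)\,p_{\beta,\epsilon}.
\end{aligned}$$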

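Taking the supremum of the left hand side over all $i\in Q_\beta$ and all $X\in\mathcal{M}_\epsilon$, we obtain, for some constant $F$,

$$p_{\beta,\epsilon}\le F\epsilon^{\beta}+(1-C_1)\,p_{\beta,\epsilon},$$

so that $p_{\beta,\epsilon}\le(F/C_1)\epsilon^{\beta}$, from which it follows that the induction hypothesis is also true for $\beta+1$.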

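Finally, we assume that $i\in S_\infty$. (Notice that, by (2.3), $\alpha_{ik}=\infty$ for every $k\in TR$ with $k\notin S_\infty$.) Then,

$$\begin{aligned}
P(x(T)=j\mid x(0)=i)&\le P(x(1)\in TR,\ x(1)\notin S_\infty\mid x(0)=i)+P(x(1)=j\mid x(0)=i)\\
&\quad+P(x(1)\in S_\infty\mid x(0)=i)\,p_{\infty,\epsilon}\\
&\le Nf(\epsilon)+(1-C_1)\,p_{\infty,\epsilon}.
\end{aligned}$$

Thus, $p_{\infty,\epsilon}\le(N/C_1)f(\epsilon)$, $\forall\epsilon>0$. This completes the proof of the second inequality in (2.4). The first inequality is a trivial consequence of (2.1). •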
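For a stationary chain, (2.4) can be checked directly on a small case. The Python sketch below is ours, not the paper's: it computes $h_i=P(x(T)=\text{target}\mid x(0)=i)$ by fixed-point iteration of $h_i=\sum_k q_{ik}h_k$ on the transient states, for a hypothetical four-state chain in which 0 and 1 are transient, 2 and 3 are absorbing, and $\alpha_{01}=\alpha_{13}=1$, $\alpha_{03}=2$. Solving the two linear equations by hand gives $h_1=\epsilon$ and $h_0=2\epsilon^2$, so the printed ratio $h_0/\epsilon^2$ should stay essentially at 2, in agreement with the order $\epsilon^{\alpha_{03}}$ predicted by Proposition 2.1.

```python
from typing import Iterable, List


def absorption_probabilities(q: List[List[float]], transient: Iterable[int],
                             target: int, sweeps: int = 5000) -> List[float]:
    """h[i] = P(x(T) = target | x(0) = i) for a stationary chain, by fixed-point iteration.

    Recurrent states keep their boundary value (1 for target, 0 otherwise);
    transient states are updated with h_i = sum_k q_ik h_k.
    """
    n = len(q)
    transient = list(transient)
    h = [1.0 if k == target else 0.0 for k in range(n)]
    for _ in range(sweeps):
        h_new = h[:]
        for i in transient:
            h_new[i] = sum(q[i][k] * h[k] for k in range(n))
        h = h_new
    return h


def example_chain(eps: float) -> List[List[float]]:
    """Hypothetical chain: 0, 1 transient; 2, 3 absorbing; q_ij of order eps**alpha_ij."""
    return [
        [0.5, eps / 2, 0.5 - eps / 2 - eps**2 / 2, eps**2 / 2],
        [0.0, 0.5, 0.5 - eps / 2, eps / 2],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ]


for eps in (1e-1, 1e-2, 1e-3):
    h = absorption_probabilities(example_chain(eps), transient=[0, 1], target=3)
    print(eps, h[0], h[0] / eps**2)
```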

Let us mention another method for proving Proposition 2.1. We could first prove it for stationary Markov chains in $\mathcal{M}_\epsilon$, because in this case there are explicit formulae for the absorption probabilities. (Such a result is obtained in [12].) Then, we notice that $p_{a,\epsilon}$ is bounded above by the absorption probabilities which would result if an adversary were allowed to choose $q_{ij}(t)$ at each time $t$ after observing the current state, subject to the constraints (2.1) and (2.2). It follows from standard results in Markovian decision theory that the optimal policy for the adversary is a stationary one and, therefore, the bounds obtained for stationary Markov chains also apply to the nonstationary ones. Unfortunately, this method does not seem to work for our subsequent results, because they correspond to a maximization over a finite horizon, for which stationary policies are not in general optimal.

Let  us  also  point  out  that  Proposition  2.1  is  false  if  the  assumption  (2.2)  is  removed. 

The main result of this section is based on the following algorithm, which provides important structural information on the long run behavior of Markov chains in $\mathcal{M}_\epsilon$.

Algorithm I: (Input: $A=\{\alpha_{ij}:1\le i,j\le N\}$ and $R$; Output: $V=\{V(i,j):1\le i,j\le N\}$)

1. Let $c_{ij}=\alpha_{ij}-1$, if $i\in R$, $j\in R$, $j\notin R_i$, and $c_{ij}=\alpha_{ij}$, otherwise. (Notice that $c_{ij}\ge0$ always holds.)

2. Solve the shortest path problem from any origin $i\in R$ to any destination $j\in R$, with respect to the link lengths $c_{ij}$ and subject to the constraint that any intermediate state on a path must be an element of $R$. For example, the Bellman algorithm may be used: $V_0(i,j)=0$, if $i=j$; $V_0(i,j)=\infty$, if $i\ne j$; and
$$V_{n+1}(i,j)=\min_{k\in R}\{V_n(i,k)+c_{kj}\}. \tag{2.5}$$
Let $V(i,j)$ be the length of the shortest path (which is obtained after at most $N$ stages of the Bellman algorithm suggested above).

3. If $i\in R$, $j\in TR$, let
$$V(i,j)=\min_{k\in R}\{V(i,k)+c_{kj}\}=\min_{k\in R}\{V(i,k)+\alpha_{kj}\}. \tag{2.6}$$

4. If $i\in TR$, let
$$V(i,j)=\min_{k\in R}\{c_{ik}+V(k,j)\}=\min_{k\in R}\{\alpha_{ik}+V(k,j)\}. \tag{2.7}$$
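The four steps can be sketched in Python as follows. This is our illustration, not code from the paper: states are numbered from 0, $\infty$ is represented by `math.inf`, the function name `algorithm_one` and the three-state structure (states 0 and 2 are "wells", state 1 is a "barrier", with $R=\{0,2\}$) are hypothetical, and we assume that the input satisfies (2.3). In the Bellman step we also keep the current value $V_n(i,j)$ inside the minimum; this does not change (2.5), because $c_{jj}=\alpha_{jj}=0$ for recurrent $j$ under (2.3).

```python
import math
from typing import List, Set

INF = math.inf
Matrix = List[List[float]]


def algorithm_one(alpha: Matrix, recurrent: Set[int]) -> Matrix:
    """Sketch of Algorithm I: returns V[i][j] for all states i, j.

    alpha[i][j] holds alpha_ij (math.inf for infinity); `recurrent` is the set R.
    Assumes alpha satisfies (2.3) and R is nonempty.
    """
    n = len(alpha)
    R = sorted(recurrent)
    TR = [i for i in range(n) if i not in recurrent]

    # Step 1: c_ij = alpha_ij - 1 between different ergodic classes
    # (i, j in R with alpha_ij > 0, i.e. j not in R_i); c_ij = alpha_ij otherwise.
    c = [[alpha[i][j] - 1 if (i in recurrent and j in recurrent and alpha[i][j] > 0)
          else alpha[i][j] for j in range(n)] for i in range(n)]

    V = [[INF] * n for _ in range(n)]
    # Step 2: Bellman iteration (2.5), restricted to recurrent states.
    for i in R:
        dist = {j: (0 if j == i else INF) for j in R}
        for _ in range(n):
            dist = {j: min([dist[j]] + [dist[k] + c[k][j] for k in R]) for j in R}
        for j in R:
            V[i][j] = dist[j]
    # Step 3, eq. (2.6): recurrent origin, transient destination.
    for i in R:
        for j in TR:
            V[i][j] = min(V[i][k] + alpha[k][j] for k in R)
    # Step 4, eq. (2.7): transient origin, any destination.
    for i in TR:
        for j in range(n):
            V[i][j] = min(alpha[i][k] + V[k][j] for k in R)
    return V


# Hypothetical double-well structure: 0 and 2 are wells, 1 is a barrier state.
alpha = [[0, 1, 1],
         [0, 0, 0],
         [2, 2, 0]]
print(algorithm_one(alpha, {0, 2}))  # [[0, 1, 0], [0, 1, 0], [1, 2, 0]]
```

Here $V(0,2)=0$ while $V(2,0)=1$: on the time scale $1/\epsilon$ the chain leaves the shallow well with probability of order one, but leaves the deeper well only with probability of order $\epsilon$.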

Notice that the output $V(i,j)$ of the above algorithm may be interpreted as the length of the shortest path from $i$ to $j$ subject to the constraint that all states on the path belong to $R$, except possibly for the first and the last one. We continue with a few elementary observations on this algorithm:

Proposition 2.2: (i) $V(i,j)\ge0$, $\forall i,j$.

(ii) $V(i,j)\ge1$, $\forall i$, $\forall j\in TR$.

(iii) $V(i,j)\le V(i,k)+V(k,j)$, $\forall i,j,k$.

(iv) If $j\in R$ and $j'\in R_j$, then $V_n(i,j)=V_n(i,j')$, $\forall i,n$. Also, if $i\in R$ and $i'\in R_i$, then $V_n(i,j)=V_n(i',j)$, $\forall j,n$.

Proof: Part (i) follows from the shortest path interpretation and the nonnegativity of the $c_{ij}$'s. Part (ii) follows from (2.6) (together with (2.7) when $i\in TR$) and the fact that $\alpha_{kj}\ge1$ whenever $k\in R$ and $j\in TR$. Part (iii) is clearly true for $k\in R$, due to the shortest path interpretation. So, assume that $k\in TR$. Let us take shortest paths from $i$ to $k$ (of length $V(i,k)$) and from $k$ to $j$ (of length $V(k,j)$) and concatenate them. This produces a path from $i$ to $j$, of length $V(i,k)+V(k,j)$, such that all intermediate states, except for $k$, belong to $R$. If $k_1$ and $k_2$ are the predecessor and the successor, respectively, of $k$ in this path, we use (2.3) to conclude that $c_{k_1k}+c_{kk_2}\ge c_{k_1k_2}$, which shows that $k$ may be eliminated from this path, to produce a path from $i$ to $j$, with all intermediate elements belonging to $R$, and with length less than or equal to $V(i,k)+V(k,j)$, as desired. Finally, for part (iv), we use assumption (2.3) to see that $c_{ij}=\alpha_{ij}=0$ whenever $i\in R$ and $j\in R_i$. The result follows from the shortest path interpretation. •

We notice that, as a consequence of part (iv) of the proposition, the algorithm need not be carried out for all states. It suffices to consider the transient states and one representative from each ergodic class.

The following proposition establishes the relevance of the $V(i,j)$'s to the Markov chains under study.

Proposition 2.3: For any $C_3>0$, there exist positive constants $G_1$, $G_2$, $G_3$, $G_4$, with $G_4<1$, and some $g\in\mathcal{U}$ such that, for any $\epsilon>0$, for any Markov chain in $\mathcal{M}_\epsilon$ and any states $i$, $j$, we have

$$G_1\big(\epsilon(t-N)\big)^N\epsilon^{V(i,j)}\le p_{ij}(t)\le G_2\epsilon^{V(i,j)}+\chi_iG_3G_4^{\,t}+g(\epsilon),\qquad\forall t\in[N,C_3/\epsilon], \tag{2.8}$$

where $\chi_i=0$, if $i\in R$, and $\chi_i=1$, otherwise. (The upper bound in (2.8) is also true for $t\in[1,N]$.) In particular, there exist $G_1>0$, $G_2>0$, $g\in\mathcal{U}$ such that

$$G_1\epsilon^{V(i,j)}\le p_{ij}(1/\epsilon)\le G_2\epsilon^{V(i,j)}+g(\epsilon). \tag{2.9}$$

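Proof: Notice that for any $i\in R$, $j\notin R_i$ we have $q_{ij}(t)\le C_2\epsilon$, $\forall t$. It follows that $P(x(t+1)\notin R_j\mid x(t)\in R_j)\le NC_2\epsilon$, from which we easily conclude that there exists some $F_1>0$ such that

$$P(x(t)\in R_i\mid x(s)\in R_i)\ge F_1,\qquad0\le s\le t\le C_3/\epsilon,\ \forall\epsilon>0,\ \forall X\in\mathcal{M}_\epsilon,\ \forall i\in R. \tag{2.10}$$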

We now start the proof of the lower bound in (2.8). If $V(i,j)=\infty$, there is nothing to prove, so we will be assuming that $V(i,j)<\infty$. We first assume that $i\in R$ and $j\in R$. Then, there exists a sequence $i=i_1,i_2,\ldots,i_n=j$ of elements of $R$ (with $n\le N$) such that $\sum_{k=1}^{n-1}c_{i_ki_{k+1}}=V(i,j)$ and such that $\alpha_{i_ki_{k+1}}\ge1$, $\forall k$. Let $k\in\mathbb{N}$ and suppose that there exists some $F_k>0$ such that, for all $\epsilon>0$ and for all $X\in\mathcal{M}_\epsilon$,

$$P(x(t)\in R_{i_k}\mid x(0)=i)\ge F_k\big(\epsilon(t-k+1)\big)^{k-1}\epsilon^{\sum_{l=1}^{k-1}c_{i_li_{l+1}}},\qquad\forall t\in[k-1,C_3/\epsilon]. \tag{2.11}$$

We then have

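$$\begin{aligned}
P(x(t)\in R_{i_{k+1}}\mid x(0)=i)&\ge\sum_{s=k-1}^{t-1}P(x(t)\in R_{i_{k+1}}\mid x(s+1)\in R_{i_{k+1}})\,P(x(s+1)\in R_{i_{k+1}}\mid x(s)\in R_{i_k})\,P(x(s)\in R_{i_k}\mid x(0)=i)\\
&\ge\sum_{s=k}^{t-1}F_1\,C_1\epsilon^{\alpha_{i_ki_{k+1}}}\,F_k\big(\epsilon(s-k+1)\big)^{k-1}\epsilon^{\sum_{l=1}^{k-1}c_{i_li_{l+1}}}.
\end{aligned}\tag{2.12}$$

Clearly, there exists a constant $F'_k$ such that

$$\sum_{s=k}^{t-1}(s-k+1)^{k-1}\ge F'_k(t-k)^k,\qquad\forall t.$$

Since $\alpha_{i_ki_{k+1}}=c_{i_ki_{k+1}}+1$, combining the last two displays shows that (2.11) also holds with $k+1$ in place of $k$.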

Inequality (2.10) shows that (2.11) holds for $k=1$. We have thus proved by induction on $k$ that (2.11) holds for all $k$. Notice that

$$P(x(t)=j\mid x(0)=i)\ge P(x(t)=j\mid x(t-1)\in R_j)\,P(x(t-1)\in R_j\mid x(0)=i)\ge C_1P(x(t-1)\in R_j\mid x(0)=i),$$

which completes the proof of the left hand side of (2.8), for the case where $i\in R$ and $j\in R$.

Suppose now that $i\in R$, $j\in TR$ and let $k\in R$ be such that $V(i,j)=V(i,k)+\alpha_{kj}$. If $\alpha_{kj}=\infty$, then $V(i,j)=\infty$ and there is nothing to prove. So, assume that $\alpha_{kj}<\infty$. Then,

$$P(x(t)=j\mid x(0)=i)\ge P(x(t)=j\mid x(t-1)=k)\,P(x(t-1)=k\mid x(0)=i)\ge C_1\epsilon^{\alpha_{kj}}P(x(t-1)=k\mid x(0)=i).$$

Given that we have already proved the lower bound for $p_{ik}(t)$, the desired result for $p_{ij}(t)$ follows.

Finally, let $i\in TR$. The result follows similarly by choosing $k\in R$ so that $\alpha_{ik}+V(k,j)=V(i,j)$ and using the inequality

$$P(x(t)=j\mid x(0)=i)\ge P(x(1)=k\mid x(0)=i)\,P(x(t)=j\mid x(1)=k).$$


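We now turn to the proof of the upper bound in (2.8). Let $i\in R$ be fixed. We define $E_a=\{j\in R:V(i,j)=a\}$, $T_a=\{j\in TR:V(i,j)=a\}$, $R_{\le a}=\bigcup_{\beta\le a}E_\beta$. We also define similarly $R_{\ge a}$, $T_{\le a}$, $T_{\ge a}$. We will prove by induction that for any $a<\infty$ the following statements hold:

$(SE_a)$: There exists some $G_a$ such that $\forall\epsilon>0$, $\forall X\in\mathcal{M}_\epsilon$, $\forall j\in R_{\ge a}$ and $\forall t\le C_3/\epsilon$ we have $p_{ij}(t)\le G_a\epsilon^a$.

$(ST_a)$: There exists some $G'_a$ such that $\forall\epsilon>0$, $\forall X\in\mathcal{M}_\epsilon$, $\forall j\in T_{\ge a}$ and $\forall t\le C_3/\epsilon$ we have $p_{ij}(t)\le G'_a\epsilon^a$.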
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Statement  S li, 0  is  trivially  true,  with  Ga  —  1.  We  now  prove  ST\.  (Notice  that  7*^  i  =TR.) 
Now, 

/'(x(t  +  1)6  Til  1 x(0)  =  i)  < 

r(x(t  +  i)g  Tit  | *(<)  e  tv*)  P[x(t)  e  tr  |  x(o)  = »)  +  /5(x(t  + 1)  g  tii  |  x(t)  g  «)  < 

(1  -  c1)/>(x(i)  G  77t  I  x(0)  =  *)  +  NC2£.  (2.13) 

Since  i  e  It,  r(x{ 0)  €  TR  j  x(0)  =  «)  =  0  and  (2.13)  implies  P(x(i)  G  TR  |  x(0)  =  t)  <  ( NC2e)/Cu 
Vt>0,  which  proves  STi. 

Now let $a$ be some positive integer and assume that statements $SE_{\beta-1}$ and $ST_\beta$ are true, for all $\beta\le a$. We will prove that $SE_a$ and $ST_{a+1}$ are also true. We first need the following lemma.
Lemma 2.1: If $j\in J=R_{\le(a-1)}\cup T_{\le a}$ and $k\in K=R_{\ge a}\cup T_{\ge(a+1)}$, then $V(i,j)+\alpha_{jk}\ge a+1$.

Proof: (i) If $j\in R_{\le(a-1)}$, $k\in R_{\ge a}$, then $k\notin R_j$ (by part (iv) of Proposition 2.2, since $V(i,j)<V(i,k)$), and therefore $V(i,j)+\alpha_{jk}=V(i,j)+c_{jk}+1\ge V(i,k)+1\ge a+1$.

(ii) If $j\in R_{\le(a-1)}$, $k\in T_{\ge(a+1)}$, then $V(i,j)+\alpha_{jk}=V(i,j)+c_{jk}\ge V(i,k)\ge a+1$.

(iii) If $j\in T_{\le a}$, $k\in R_{\ge a}$, let $l\in R$ be such that $V(i,l)+\alpha_{lj}=V(i,j)$. Suppose that $l\in R_k$. Then, $V(i,l)=V(i,k)\ge a$ and $V(i,j)=V(i,l)+\alpha_{lj}\ge a+1$, which contradicts the assumption $j\in T_{\le a}$. We thus assume that $l\notin R_k$. Then, $V(i,j)+\alpha_{jk}=V(i,l)+\alpha_{lj}+\alpha_{jk}\ge V(i,l)+\alpha_{lk}=V(i,l)+c_{lk}+1\ge V(i,k)+1\ge a+1$.

(iv) If $j\in T_{\le a}$, $k\in T_{\ge(a+1)}$, let $l\in R$ be such that $V(i,l)+\alpha_{lj}=V(i,j)$. Then, $V(i,j)+\alpha_{jk}=V(i,l)+\alpha_{lj}+\alpha_{jk}\ge V(i,l)+\alpha_{lk}\ge V(i,k)\ge a+1$. •

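We now use the induction hypothesis and Lemma 2.1 to obtain

$$\begin{aligned}
P(x(t+1)\in K\mid x(t)\in J)\,P(x(t)\in J\mid x(0)=i)&\le\sum_{k\in K,\,j\in J}P(x(t+1)=k\mid x(t)=j)\,P(x(t)=j\mid x(0)=i)\\
&\le\sum_{k\in K,\,j\in J}C_2\epsilon^{\alpha_{jk}}\,G\epsilon^{V(i,j)}\le(N^2C_2G)\,\epsilon^{a+1},
\end{aligned}$$

where $G=\max\{G_{\beta-1},G'_\beta:\beta\le a\}$. Since $J$ and $K$ partition the state space and $i\in J$, the chain can only enter $K$ through a transition out of $J$; summing the last bound over at most $C_3/\epsilon$ time steps, it follows that

$$P(x(t)\in K\mid x(0)=i)\le(N^2C_2G)\,\epsilon^{a+1}C_3\epsilon^{-1},\qquad\forall t\in[1,C_3/\epsilon],$$

which proves $SE_a$. Finally,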

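$$P(x(t+1)\in T_{\ge(a+1)}\mid x(0)=i)\le(1-C_1)\,P(x(t)\in T_{\ge(a+1)}\mid x(0)=i)+NG_a\epsilon^aC_2\epsilon+N^2C_2G\epsilon^{a+1},$$

which shows that

$$P(x(t)\in T_{\ge(a+1)}\mid x(0)=i)\le\frac{1}{C_1}\big(NG_aC_2+N^2C_2G\big)\epsilon^{a+1},\qquad\forall t\in[1,C_3/\epsilon].$$

This proves $ST_{a+1}$ and completes the induction.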

We have thus completed the proof of the upper bound in (2.8) for the case where $i\in R$ and $V(i,j)<\infty$. The proof for the case $i\in R$ and $V(i,j)=\infty$ is very simple and is omitted. We now assume that $i\in TR$. Let $T$ be the random time of Proposition 2.1. Then, for some $F>0$, $G>0$, $g',g''\in\mathcal{U}$, we have

$$\begin{aligned}
p_{ij}(t)&\le P(T>t\mid x(0)=i)+\sum_{k\in R}P(x(t)=j\mid x(T)=k,\,T\le t)\,P(x(T)=k,\,T\le t\mid x(0)=i)\\
&\le(1-C_1)^t+\sum_{k\in R}G\epsilon^{V(k,j)}\big[F\epsilon^{\alpha_{ik}}+g'(\epsilon)\big]\\
&\le(1-C_1)^t+NGF\epsilon^{V(i,j)}+g''(\epsilon),\qquad\forall t\in[1,C_3/\epsilon].
\end{aligned}$$

This completes the proof of the proposition. •

Notice that the upper and lower bounds are tight, within a multiplicative constant independent of $\epsilon$, when $t=1/\epsilon$. For smaller times the bounds are much further apart. It is not hard to close this gap, although we do not need to do this for our purposes. In particular, the exponent in the term $(\epsilon(t-N))^N$ in the lower bound may be reduced. This may be accomplished with a minor modification of the induction hypothesis in the proof of the lower bound. The upper bound may also be improved in a similar manner.
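For a stationary chain, the tightness of (2.9) at $t=1/\epsilon$ is easy to probe numerically. The sketch below is ours and purely illustrative: it builds a hypothetical stationary chain with structure $\alpha=\begin{pmatrix}0&1&1\\0&0&0\\2&2&0\end{pmatrix}$ (states 0 and 2 are "wells", state 1 a "barrier"), whose values $V(i,j)$ we obtained by applying Algorithm I by hand with $R=\{0,2\}$; it then multiplies the one-step matrix $1/\epsilon$ times and prints $p_{ij}(1/\epsilon)/\epsilon^{V(i,j)}$, which (2.9) predicts to be bounded above and below by constants independent of $\epsilon$. A single value of $\epsilon$ is only a sanity check, not a verification of an asymptotic statement.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python product of two square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def double_well_chain(eps: float) -> Matrix:
    """Hypothetical stationary chain with q_ij of order eps**alpha_ij,
    alpha = [[0, 1, 1], [0, 0, 0], [2, 2, 0]]."""
    return [
        [1 - 0.5 * eps, 0.25 * eps, 0.25 * eps],
        [1 / 3, 1 / 3, 1 / 3],
        [0.25 * eps**2, 0.25 * eps**2, 1 - 0.5 * eps**2],
    ]


# V(i, j) for this structure, computed by hand with Algorithm I (R = {0, 2}).
V = [[0, 1, 0], [0, 1, 0], [1, 2, 0]]

eps = 0.01
t = round(1 / eps)
q = double_well_chain(eps)
p = [[float(i == j) for j in range(3)] for i in range(3)]  # identity matrix = p(0)
for _ in range(t):
    p = matmul(p, q)                                       # p(s + 1) = p(s) q

# (2.9) predicts p_ij(1/eps) / eps**V(i, j) to be bounded away from 0 and infinity.
ratios = [[p[i][j] / eps ** V[i][j] for j in range(3)] for i in range(3)]
for row in ratios:
    print(["%.3f" % r for r in row])
```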

The remainder of this section is devoted to showing that the assumption (2.3) on the structure of the Markov chains under study is not an essential restriction. Roughly speaking, we will establish that our results are applicable to any Markov chain which is aperiodic in the fastest time scale, in a strong sense to be defined below.

Let there be given a set of nonnegative integers $A=\{\alpha_{ij}:1\le i,j\le N\}$, not necessarily satisfying (2.3). Let us define $\beta_{ij}$ as the length of the shortest path from $i$ to $j$, with respect to the link lengths $\alpha_{ij}$. (We require a "path" to have at least one hop; thus, $\beta_{ii}\ne0$, in general.) We make the following assumption on $A$:

Assumption AP: There exists some positive integer $M$ with the following property: for any $m\ge M$ and for any $i$ such that $\beta_{ii}=0$, there exists a path $(i_0,i_1,\ldots,i_m)$ such that $i_0=i_m=i$ and which has zero length (with respect to the link lengths $\alpha_{ij}$).


For any Markov chain whose structure is described by $A$, meaning that the estimates (2.1), (2.2) are valid, Assumption AP requires the following: if we substitute 0 for $\epsilon$ and decompose the resulting Markov chain into ergodic classes, in the usual manner, then each of the noncommunicating classes of recurrent states is aperiodic. However, this requirement is not sufficient for Assumption AP to hold.

It can be shown that if $A$ satisfies Assumption AP, then $M$ can be chosen to be smaller than $N^2$. (This is related to the fact that the "index of primitivity" of any primitive nonnegative matrix is bounded above by $N^2-2N+2$; for more details, see Chapter 2 of [13].)

Now suppose that $A$ satisfies Assumption AP and let $M$ be as prescribed in that assumption. Given some positive constants $C_1$, $C_2$, some $f\in\mathcal{U}$ and some $\epsilon>0$, consider the set $\mathcal{M}_\epsilon(A,C_1,C_2,f)$. Let $Q$ be some positive integer. For any $X\in\mathcal{M}_\epsilon(A,C_1,C_2,f)$, let us define $X^{(Q)}$ to be the discrete time Markov chain obtained by sampling $X$ every $Q$ time units. Finally, let $\mathcal{B}=\{\beta_{ij}:1\le i,j\le N\}$. Due to their definition as shortest path lengths, the coefficients $\beta_{ij}$ satisfy (2.3). The following proposition establishes that the coefficients $\beta_{ij}$ describe the structure of the sampled Markov chain, at least when the sampling period $Q$ is chosen large enough.
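The coefficients $\beta_{ij}$ are ordinary all-pairs shortest path lengths, except that the empty path is excluded, so that $\beta_{ii}$ is the length of a shortest cycle through $i$. A minimal Python sketch, assuming Floyd-Warshall for brevity (any shortest path method would do; the function name and the example are ours): the only change from the textbook version is that the diagonal is initialized with $\alpha_{ii}$ instead of 0.

```python
import math
from typing import List


def shortest_nonempty_paths(alpha: List[List[float]]) -> List[List[float]]:
    """beta[i][j] = length of a shortest path from i to j with at least one hop.

    Floyd-Warshall started from the one-hop lengths alpha; the diagonal is NOT
    reset to 0, so beta[i][i] ends up as the length of a shortest cycle through i.
    """
    n = len(alpha)
    beta = [row[:] for row in alpha]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if beta[i][k] + beta[k][j] < beta[i][j]:
                    beta[i][j] = beta[i][k] + beta[k][j]
    return beta


INF = math.inf
# Hypothetical 3-state example: 0 -> 1 (length 1), 1 -> 2 (length 2), 2 -> 0 (length 0).
beta = shortest_nonempty_paths([[INF, 1, INF], [INF, INF, 2], [0, INF, INF]])
print(beta)  # [[3, 1, 3], [2, 3, 2], [0, 1, 3]]
```

Because every $\beta_{ij}$ is a shortest path length, the output satisfies the triangle inequality (2.3) automatically, which is the property used in the text.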

Proposition 2.5: Suppose that $A$ satisfies Assumption AP. Then, there exist some $Q>0$, some positive $C'_1$, $C'_2$ and some $f'\in\mathcal{U}$ such that $\{X^{(Q)}:X\in\mathcal{M}_\epsilon(A,C_1,C_2,f)\}$ is a subset of $\mathcal{M}_\epsilon(\mathcal{B},C'_1,C'_2,f')$.

Proof: Let $B=\max\{\beta_{ij}:\beta_{ij}<\infty\}$ and $Q=\max\{N(B+2),M+2N\}$, where $M$ is the constant of Assumption AP. Let us fix some $i$, $j$. Consider an arbitrary sequence of $Q$ transitions from $i$ to $j$. The probability that this sequence occurs is bounded above by $C_2^Q\epsilon^{\beta_{ij}}$. There are less than $N^Q$ such sequences. Hence, $P(x^{(Q)}(1)=j\mid x^{(Q)}(0)=i)\le(NC_2)^Q\epsilon^{\beta_{ij}}$, which shows that $X^{(Q)}$ satisfies the right hand side inequality of (2.1), with $C_2$ replaced by $C'_2=(NC_2)^Q$ and with $\alpha_{ij}$ replaced by $\beta_{ij}$.

In order to show that the left hand side inequality in (2.1) also holds for the Markov chain $X^{(Q)}$, it is sufficient to produce a sequence of exactly $Q$ transitions leading from $i$ to $j$ for which the total length (w.r.t. $\alpha_{ij}$) is less than or equal to $\beta_{ij}$. This is vacuously true if $\beta_{ij}=\infty$; we thus assume that $\beta_{ij}<\infty$. We proceed as follows: find some path from $i$ to $j$ of length $\beta_{ij}$. Then find some $k$ which appears on this path at least $(B+2)$ times. (Such a $k$ exists because $Q\ge N(B+2)$.) Then, $B\ge\beta_{ij}\ge(B+1)\beta_{kk}$, which shows that $\beta_{kk}=0$. Now, find a path from $i$ to $k$ with length equal to $\beta_{ik}$, as well as a path from $k$ to $j$ with length $\beta_{kj}$. Let $n_1$, $n_2$ be the number of hops in these paths, respectively. Without loss of generality, we may assume that $n_1\le N$ and $n_2\le N$. Then, find a path from $k$ to $k$ (i.e., a cycle) which has zero length and exactly $Q-n_1-n_2$ hops. (This is possible due to Assumption AP and because $Q-n_1-n_2\ge Q-2N\ge M$.) Finally, merge the three paths to obtain a path from $i$ to $j$ with length $\beta_{ij}$ and with exactly $Q$ hops. •

Using the above result, Proposition 2.3 becomes applicable to an appropriately sampled version of a given Markov chain, assuming condition AP. We notice that Proposition 2.3 will provide us with estimates of the transition probabilities only for those times which are integer multiples of $Q$. However, it is easy to show that the same estimates are also valid for intermediate times.

Using a more complicated reduction procedure, it is possible to apply an appropriately modified version of Proposition 2.3 to all discrete time Markov chains, including periodic ones.

We close this section by pointing out that there is nothing special about the coefficients $\alpha_{ij}$ being integer. For example, if the $\alpha_{ij}$ are rationals, we could introduce another small parameter $\delta$ (to replace $\epsilon$) and another set of integer coefficients $\beta_{ij}$, so that $\delta^{\beta_{ij}}=\epsilon^{\alpha_{ij}}$. Even if the $\alpha_{ij}$'s are not rational, nor are their ratios rational, the proof of Proposition 2.3 remains valid, as long as $\min\{\alpha_{ij}:\alpha_{ij}>0\}\ge1$. This can always be achieved by redefining the small parameter $\epsilon$.
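For rational exponents, one concrete choice (ours, given only as an illustration) is to take $L$ as the least common multiple of the denominators of the finite $\alpha_{ij}$, set $\beta_{ij}=L\alpha_{ij}$ and $\delta=\epsilon^{1/L}$, so that $\delta^{\beta_{ij}}=\epsilon^{\alpha_{ij}}$ with integer $\beta_{ij}$. A short Python sketch using the standard `fractions` module (the function name `integerize` is hypothetical; infinite exponents are stored as `math.inf` and left unchanged):

```python
import math
from fractions import Fraction
from typing import List, Tuple, Union

Exponent = Union[int, Fraction, float]  # float is used only for math.inf


def integerize(alpha: List[List[Exponent]]) -> Tuple[int, List[List[Union[int, float]]]]:
    """Return (L, beta) with beta_ij = L * alpha_ij integral for every finite alpha_ij.

    With delta = eps**(1/L) one then has delta**beta_ij == eps**alpha_ij.
    """
    L = 1
    for row in alpha:
        for a in row:
            if a != math.inf:
                d = Fraction(a).denominator
                L = L * d // math.gcd(L, d)  # running least common multiple
    beta = [[a if a == math.inf else int(Fraction(a) * L) for a in row] for row in alpha]
    return L, beta


L, beta = integerize([[0, Fraction(1, 2)], [Fraction(2, 3), 0]])
eps = 0.01
delta = eps ** (1 / L)
print(L, beta, delta ** beta[0][1], eps ** 0.5)  # L = 6, beta = [[0, 3], [4, 0]]
```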


Proposition 2.3 allows us to determine the structure of a Markov chain $X\in\mathcal{M}_\epsilon$ in the first of the slow time scales, that is, for times of the order of $1/\epsilon$. In addition, we notice that, by (2.9), the transition probabilities $P(x(1/\epsilon)=j\mid x(0)=i)$ satisfy (2.1), (2.2) (with a new choice of $C_1$, $C_2$, $f$), provided that we replace $\alpha_{ij}$ by $V(i,j)$. Moreover, due to part (iii) of Proposition 2.2, the coefficients $V(i,j)$ satisfy the triangle inequality (2.3) and, therefore, Proposition 2.3 becomes applicable once more. This yields estimates for the transition probabilities $P(x(1/\epsilon^2)=j\mid x(0)=i)$. This procedure may be repeated to yield estimates for $P(x(1/\epsilon^d)=j\mid x(0)=i)$, for any positive integer $d$. To summarize, we have the following algorithm:

Algorithm II: (Input: $A=\{\alpha_{ij}:1\le i,j\le N\}$, satisfying (2.3); Output: for each $d\in\mathbb{N}_0$, a collection $V^d=\{V^d(i,j):1\le i,j\le N\}$ and a subset $R^d$ of the state space.)

1. Let $V^0(i,j)=\alpha_{ij}$, $\forall i,j$.

2. Having computed $V^d$, let $R^d$ be the set of all states $i$ such that, for all $j$, $V^d(i,j)=0$ implies $V^d(j,i)=0$. ($TR^d$ will denote the complement of $R^d$ and, for any $i\in R^d$, we let $R^d_i=\{j\in R^d:V^d(i,j)=0\}$.)

3. Let $V^d$, $R^d$ be the input to Algorithm I; let $V^{d+1}$ be the output returned by Algorithm I.
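Algorithm II simply alternates step 2 with a call to Algorithm I. The Python sketch below is our illustration (function names hypothetical, states numbered from 0, $\infty$ stored as `math.inf`); it repeats a compact version of Algorithm I so that it runs on its own. On the hypothetical structure $\alpha=\begin{pmatrix}0&1&1\\0&0&0\\2&2&0\end{pmatrix}$, where state 0 is a shallow well, state 2 a deeper well and state 1 a barrier between them, it returns $R^0=\{0,2\}$ and $R^d=\{2\}$ for $d\ge1$: on time scales of order $1/\epsilon^d$ with $d\ge1$, only the deeper well remains recurrent.

```python
import math
from typing import List, Set, Tuple

INF = math.inf
Matrix = List[List[float]]


def recurrent_set(V: Matrix) -> Set[int]:
    """Step 2: states i such that V[i][j] == 0 implies V[j][i] == 0 for every j."""
    n = len(V)
    return {i for i in range(n) if all(V[j][i] == 0 for j in range(n) if V[i][j] == 0)}


def algorithm_one(alpha: Matrix, R: Set[int]) -> Matrix:
    """Compact version of Algorithm I (steps 1-4); assumes (2.3) and R nonempty."""
    n = len(alpha)
    rec = sorted(R)
    c = [[alpha[i][j] - 1 if (i in R and j in R and alpha[i][j] > 0) else alpha[i][j]
          for j in range(n)] for i in range(n)]
    V = [[INF] * n for _ in range(n)]
    for i in rec:                                   # step 2: Bellman iteration over R
        dist = {j: (0 if j == i else INF) for j in rec}
        for _ in range(n):
            dist = {j: min([dist[j]] + [dist[k] + c[k][j] for k in rec]) for j in rec}
        for j in rec:
            V[i][j] = dist[j]
    for i in rec:                                   # step 3, eq. (2.6)
        for j in range(n):
            if j not in R:
                V[i][j] = min(V[i][k] + alpha[k][j] for k in rec)
    for i in range(n):                              # step 4, eq. (2.7)
        if i not in R:
            for j in range(n):
                V[i][j] = min(alpha[i][k] + V[k][j] for k in rec)
    return V


def algorithm_two(alpha: Matrix, depth: int) -> Tuple[List[Matrix], List[Set[int]]]:
    """Return ([V^0, ..., V^depth], [R^0, ..., R^depth])."""
    Vs, Rs = [alpha], [recurrent_set(alpha)]
    for _ in range(depth):
        Vs.append(algorithm_one(Vs[-1], Rs[-1]))   # step 3
        Rs.append(recurrent_set(Vs[-1]))           # step 2 at the next level
    return Vs, Rs


alpha = [[0, 1, 1],
         [0, 0, 0],
         [2, 2, 0]]
Vs, Rs = algorithm_two(alpha, depth=3)
for d, (V, R) in enumerate(zip(Vs, Rs)):
    print(d, sorted(R), V)
```

Restricting `algorithm_one` to the states of $R^d$ and to one representative per class $R^d_i$, as suggested later in the text, would reduce the work but is omitted here for clarity.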

The remarks preceding Algorithm II establish the next proposition. (Notice that when we use Proposition 2.3 to obtain estimates for $t\approx1/\epsilon^d$, the unit of time becomes $1/\epsilon^{d-1}$. For this reason, the variable $t$ in Proposition 2.3 must be replaced by $t\epsilon^{d-1}$.)

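Proposition 3.1: Given some $A$ satisfying (2.3) and some $d\in\mathbb{N}$, let $V^d(i,j)$, $R^d$ be the collection of integers and the subset returned by Algorithm II. Then, for any positive constants $C_1$, $C_2$ and for any $f\in\mathcal{U}$, there exist positive constants $D_1$, $D_2$, $D_3$, $D_4<1$ and $g\in\mathcal{U}$, such that, for any $\epsilon>0$ and for any Markov chain $X\in\mathcal{M}_\epsilon(A,C_1,C_2,f)$, we have

$$D_1\big(\epsilon(t\epsilon^{d-1}-N)\big)^N\epsilon^{V^d(i,j)}\le P(x(t)=j\mid x(0)=i)\le D_2\epsilon^{V^d(i,j)}+\chi_iD_3D_4^{\,t\epsilon^{d-1}}+g(\epsilon),\qquad\forall t\in[N/\epsilon^{d-1},1/\epsilon^d], \tag{3.1}$$

where $\chi_i=0$ if $i\in R^{d-1}$ and $\chi_i=1$, otherwise. (The upper bound in (3.1) is also valid for $t\in[1/\epsilon^{d-1},N/\epsilon^{d-1}]$.) In particular, there exist $D_1,D_2>0$, $g\in\mathcal{U}$ such that

$$D_1\epsilon^{V^d(i,j)}\le p_{ij}(1/\epsilon^d)\le D_2\epsilon^{V^d(i,j)}+g(\epsilon). \tag{3.2}$$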


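We continue with a few remarks on the quantities computed by Algorithm II.

Proposition 3.2: (i) For any $d$, the coefficients $V^d(i,j)$ satisfy the triangle inequality (2.3).

(ii) For any $d$, we have $R^{d+1}\subset R^d$.

(iii) $V^c(i,j)+V^d(j,k)\ge V^{\max\{c,d\}}(i,k)$, $\forall i,j,k,c,d$.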

Proof: (i) This is an immediate consequence of part (iii) of Proposition 2.2.

(ii) Suppose that $i\in R^{d+1}$. Then, $V^{d+1}(i,i)=0$. Using part (ii) of Proposition 2.2, we conclude that $i\notin TR^d$ or, equivalently, $i\in R^d$.

(iii) Using Proposition 3.1 twice, there exist constants $D_1$, $D_2$ such that

$$D_1\epsilon^{V^c(i,j)+V^d(j,k)}\le P\big(x(1/\epsilon^c+1/\epsilon^d)=k\mid x(0)=i\big)\le D_2\epsilon^{V^{\max\{c,d\}}(i,k)}.$$

Moreover, this inequality is true for all $X\in\mathcal{M}_\epsilon$ and for all $\epsilon>0$. Letting $\epsilon$ be arbitrarily small, we conclude that the claimed result holds. •

As a corollary of Proposition 3.2, we conclude that some of the upper bounds of Proposition 3.1 are true even for times smaller than $1/\epsilon^{d-1}$.

Corollary 3.1: If $i\in R^d$, or if $j\in R^d$, or if $V^d(i,j)\le V^c(i,j)$, $\forall c<d$, then there exists some $C>0$ such that

$$p_{ij}(t)\le C\epsilon^{V^d(i,j)},\qquad\forall t\in[0,1/\epsilon^d],\ \forall X\in\mathcal{M}_\epsilon,\ \forall\epsilon>0. \tag{3.3}$$

Proof: If $i\in R^d$, then $V^d(i,i)=0$. For any $c<d$, and for any $j$, we may apply part (iii) of Proposition 3.2 to obtain $V^d(i,j)\le V^d(i,i)+V^c(i,j)=V^c(i,j)$. A similar argument leads to the same conclusion if $j\in R^d$. Now, given some $t\le1/\epsilon^d$, find some $c$ such that $t\in[1/\epsilon^{c-1},1/\epsilon^c]$. We then use Proposition 3.1 to obtain $p_{ij}(t)\le D\epsilon^{V^c(i,j)}\le D\epsilon^{V^d(i,j)}$. •

Inequality  (3.3)  is  in  general  false  if  its  assumption  fails  to  hold. 

We continue with a few remarks on the applicability and usefulness of Algorithms I and II. Looking back at Algorithm I, we see that in order to determine $V(i,j)$ for $i\in R$ and $j\in R$, we only need to know the coefficients $\alpha_{ij}$ for $i$ and $j$ belonging to $R$. This has the following implication for Algorithm II: in order to compute the coefficients $\{V^{d+1}(i,j):i,j\in R^d\}$, we only need to know the coefficients $\{V^d(i,j):i,j\in R^d\}$. Since $R^{d+1}\subset R^d$, it follows that the coefficients $\{V^{d+1}(i,j):i,j\in R^{d+1}\}$ may be computed from the coefficients $\{V^d(i,j):i,j\in R^d\}$. Thus, if we are only interested in determining which states are recurrent for each time scale (as well as in determining the corresponding ergodic decomposition), we may eliminate, at each stage of Algorithm II, the states which have been found to be transient, that is, the elements of $TR^d$. This observation, together with the fact that we only need to carry out the algorithm for just one representative from each class $R^d_i$, should result in substantial savings, were the algorithm to be implemented.




Algorithm II is also applicable to continuous time Markov chains. For example, let there be given a stationary (for simplicity) Markov chain whose generator $A_\epsilon$ is a polynomial in $\epsilon$, where $\epsilon$ is an unspecified positive parameter. Then, the transition probabilities, over a time interval of unit duration, satisfy inequalities (2.1), (2.2) for a suitable choice of $\alpha_{ij}$. (In fact, the $\alpha_{ij}$'s may be read off from the Taylor series expansion of $e^{A_\epsilon}$ or, equivalently, by solving a shortest path problem; the details are omitted.) Moreover, it can be shown that these coefficients automatically satisfy assumption (2.3), so that Propositions 2.3 and 3.1 may be applied to the discrete time Markov chain obtained by sampling the continuous time Markov chain at integer times. Finally, an elementary argument shows that the estimates obtained are valid for non-integer times as well.

We compare Algorithm II and Proposition 3.1 to the results available in the literature. There has been a substantial amount of research on singularly perturbed stationary Markov chains [1,2,3,4,12]. Typical results obtain exact asymptotic expressions for the transition probabilities, as a small parameter $\epsilon$ converges to zero. These asymptotic expressions are obtained recursively, by proceeding from one time scale to the next one, similarly to Algorithm II. Each step in this recursion involves the solution of systems of linear equations and, possibly, the evaluation of the pseudoinverse of some matrices [1], which may be computationally demanding, especially if we are dealing with large scale systems. However, we may conceive of situations in which we are not so much interested in knowing the values of the transition probabilities, but rather we want to know which events are likely to occur (over a certain time interval) and which events have asymptotically negligible probability (as $\epsilon$ goes to zero). For the latter case, a non-numerical, graph-theoretic method is more natural. Such a method (for stationary Markov chains) is implicit and easy to extract from the results of [12]. Algorithm II also accomplishes the same.

On the more technical side, it does not follow from the literature, nor is it a priori obvious, that there exist integer coefficients $V^d(i,j)$ such that inequalities of the type (3.1) hold. The existing results provide approximations for those transition probabilities which do not vanish as $\epsilon$ approaches zero [1,2,3,4,12], but much less is known about the asymptotic behavior of the vanishing transition probabilities. Furthermore, the techniques which are usually employed are tailored to stationary Markov chains (e.g., perturbation theory of linear operators) and do not seem applicable to the analysis of non-stationary chains. The discussion following Proposition 2.1 suggests one method for applying results for stationary chains to non-stationary ones, but it does not seem to be universally applicable. Let us also point out that Proposition 3.1 is fairly easy to derive for “nearly decomposable” Markov chains [3]. This is not the case for more general Markov chains; in particular, the existence of transient states which feed into different ergodic classes is the main source of difficulty [12].


IV. SIMULATED ANNEALING

In simulated annealing [6, 10] we are given a set $S = \{1,\dots,N\}$ of states together with a cost function $J: S \mapsto \mathbb{Z}$ to be minimized. (Our restriction that $J$ takes integer values is not significant.) The algorithm jumps randomly from one state to another and forms a Markov chain with the following transition probabilities:

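$$P\big(x(t+1) = j \mid x(t) = i\big) = Q(i,j)\exp\Big[\min\Big\{0, -\frac{J(j) - J(i)}{T(t)}\Big\}\Big], \qquad j \ne i, \tag{4.1}$$

$$P\big(x(t+1) = i \mid x(t) = i\big) = 1 - \sum_{j \ne i} P\big(x(t+1) = j \mid x(t) = i\big), \tag{4.2}$$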

where the kernel $Q(i,j)$ is nonnegative and satisfies $\sum_j Q(i,j) = 1$, and $T(t) > 0$ is the “temperature” at time $t$. It is known that if $T(t)$ decreases to zero slowly enough, then $x(t)$ converges (in probability) to the set at which $J$ is minimized [5-9,11]. We are interested in determining how slowly $T(t)$ must converge to zero so that convergence to the minimizing states is obtained. This issue has been resolved by Hajek [9] under some restrictions on the structure of $Q$. We shall shortly derive the answer to this question in a more general setting. Moreover, our method establishes a connection between simulated annealing and the structure of singularly perturbed stationary Markov chains.
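A minimal sketch of one step of (4.1), (4.2) follows, assuming the kernel $Q$ is given as a list of rows and the temperature $T > 0$ is held fixed for that step. The name `annealing_matrix` and the three-state example are illustrative only.

```python
import math

def annealing_matrix(Q, J, T):
    """One-step transition matrix of simulated annealing at temperature T > 0:
    off-diagonal entries follow (4.1), diagonal entries follow (4.2)."""
    n = len(J)
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if j != i:
                # Propose j with probability Q[i][j]; accept with
                # probability exp(min(0, -(J[j] - J[i]) / T)).
                P[i][j] = Q[i][j] * math.exp(min(0.0, -(J[j] - J[i]) / T))
        # Rejected proposals (and proposals to stay) leave the chain at i.
        P[i][i] = 1.0 - sum(P[i][j] for j in range(n) if j != i)
    return P

# Hypothetical 3-state example: states on a line with costs J = [0, 2, 1].
Q = [[0.5, 0.5, 0.0],
     [0.5, 0.0, 0.5],
     [0.0, 0.5, 0.5]]
P = annealing_matrix(Q, [0, 2, 1], T=1.0)
for row in P:
    print([round(p, 4) for p in row])
```

Downhill moves (here $1 \to 0$ and $1 \to 2$) are always accepted, so row 1 has no self-loop, while the uphill move $0 \to 1$ is damped by $e^{-2}$.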

We formulate the problem to be studied in a slightly more general manner, as follows. Suppose that we are given a stochastic matrix $P^\epsilon$ (whose $ij$-th entry is denoted by $p^\epsilon_{ij}$) parameterized by a positive parameter $\epsilon$, and assume that there exist positive constants $C_1$, $C_2$ and a collection $A = \{\alpha_{ij} : 1 \le i,j \le N\}$ such that $\alpha_{ij} \in \mathbb{N}_0 \cup \{\infty\}$, $\forall i,j$, such that $p^\epsilon_{ij} = 0$ whenever $\alpha_{ij} = \infty$, and

$$C_1\epsilon^{\alpha_{ij}} \le p^\epsilon_{ij} \le C_2\epsilon^{\alpha_{ij}}, \qquad \forall \epsilon \in (0,1], \text{ whenever } \alpha_{ij} < \infty.$$

Finally, we are given a function (cooling schedule) $\epsilon: \mathbb{N}_0 \mapsto (0,1)$. We are interested in the Markov chain $x(t)$ with transition probabilities given by $P(x(t+1) = j \mid x(t) = i) = p^{\epsilon(t)}_{ij}$.

Clearly, the simulated annealing algorithm is of the type described in the preceding paragraph, provided that we identify $\epsilon(t)$ with $\exp[-1/T(t)]$ and provided that we define $\alpha_{ij} = \infty$ if $Q(i,j) = 0$, $i \ne j$, and $\alpha_{ij} = \max\{0, J(j) - J(i)\}$ if $Q(i,j) \ne 0$, $i \ne j$. Also, $\alpha_{ii}$ has to be defined accordingly.
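To see why this choice fits the general formulation: for $i \ne j$ with $Q(i,j) > 0$, (4.1) gives $p_{ij} = Q(i,j)\,\epsilon^{\max\{0, J(j)-J(i)\}}$ exactly when $\epsilon = \exp[-1/T]$, so for the off-diagonal entries one may take $C_1 = \min\{Q(i,j) : Q(i,j) > 0\}$ and $C_2 = 1$. The hypothetical helper `sa_exponents` below computes the off-diagonal exponents and checks this identity numerically at one temperature. The diagonal $\alpha_{ii}$ is deliberately left as `None`, since the text only says it must be defined accordingly.

```python
import math

def sa_exponents(Q, J):
    """Off-diagonal exponents alpha[i][j] for simulated annealing, as defined in
    the text: infinity if Q[i][j] == 0, else max(0, J[j] - J[i]).
    Diagonal entries are left as None (not specified by this sketch)."""
    n = len(J)
    alpha = [[None] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                alpha[i][j] = math.inf if Q[i][j] == 0 else max(0, J[j] - J[i])
    return alpha

# Hypothetical example, checked at one temperature T with eps = exp(-1/T).
Q = [[0.5, 0.5, 0.0],
     [0.5, 0.0, 0.5],
     [0.0, 0.5, 0.5]]
J = [0, 2, 1]
T = 0.7
eps = math.exp(-1.0 / T)
alpha = sa_exponents(Q, J)
for i in range(3):
    for j in range(3):
        if i != j:
            p_ij = Q[i][j] * math.exp(min(0.0, -(J[j] - J[i]) / T))   # (4.1)
            # Same value written as Q(i,j) * eps**alpha_ij (eps**inf == 0.0).
            assert math.isclose(p_ij, Q[i][j] * eps ** alpha[i][j])
```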

We now return to our general formulation. We thus assume that $A$, $C_1$, $C_2$ are given, together with the schedule $\{\epsilon(t)\}$. We assume that $A$ satisfies (2.3) and we define, for any $d \in \mathbb{N}_0$, the quantities $V^d(i,j)$ and the sets $R^d$ by means of Algorithm II of Section III. Our main result is the following.

Proposition 4.1: Assume that for some integer $d \ge 0$,

$$\sum_{t=0}^{\infty} \epsilon^d(t) = \infty, \tag{4.3}$$

$$\sum_{t=0}^{\infty} \epsilon^{d+1}(t) < \infty. \tag{4.4}$$

Then, 

(i) $\lim_{t\to\infty} P(x(t) \in R^d \mid x(0) = i) = 1$, $\forall i$.

(ii) For any $i \in R^d$, $\limsup_{t\to\infty} P(x(t) = i \mid x(0) = i) > 0$.

Proof: The main idea of the proof is to partition $[0,\infty)$ into a set of disjoint time intervals $[t_k, t_{k+1})$ such that $x(t)$ is approximately stationary during each such interval, in the sense of Section II, and then use the estimates available for such Markov chains.

The proof for the case $d = 0$ is rather easy and is omitted. We present the comparatively harder proof for the case $d \ge 1$.

We start with the proof of part (i) of the proposition. We define $t_0 = 0$ and


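$$t_{k+1} = t_k + \frac{1}{\epsilon^{d-1}(t_k)}, \qquad \text{if } \epsilon\Big(t_k + \frac{1}{\epsilon^{d-1}(t_k)}\Big) \ge \frac{1}{2}\epsilon(t_k), \tag{4.5}$$

$$t_{k+1} = \max\Big\{t : \epsilon(t) \ge \frac{1}{2}\epsilon(t_k)\Big\}, \qquad \text{otherwise.} \tag{4.6}$$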

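(If $t_{k+1}$ as defined above turns out to be non-integer, we may assume that it is truncated to the first integer below it.) We define $A_L$ (respectively, $A_S$) as the set of all $k$'s such that $t_{k+1}$ is defined by (4.5) (respectively, (4.6)). We will need the following properties of the sequence $\{\epsilon(t_k)\}$.

$$\tfrac{1}{2}\epsilon(t_k) \le \epsilon(t) \le \epsilon(t_k), \qquad \forall t \in [t_k, t_{k+1}], \tag{4.7}$$

$$\sum_{k \in A_L} \epsilon(t_k) = \infty, \tag{4.8}$$

$$\sum_{k=0}^{\infty} \epsilon^2(t_k) < \infty. \tag{4.9}$$

Let $f(k,l)$ be the cardinality of $A_L \cap \{l, \dots, k-1\}$, for $k \ge l$. Then, for any $C \in (0,1)$,

$$\sum_{k=0}^{\infty}\sum_{l=0}^{k}(1-C)^{f(k,l)}\epsilon(t_k)\epsilon(t_l) < \infty, \tag{4.10}$$

$$\lim_{k\to\infty}\sum_{l=0}^{k}(1-C)^{f(k,l)}\epsilon(t_l) = 0. \tag{4.11}$$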


Proof:  Inequalities  (4.7)  are  an  immediate  consequence  of  (4.5),  (4.6). 

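We notice that for any $k \in A_S$, $k' \in A_S$, with $k' > k$, we have $\epsilon(t_{k'}) \le \frac{1}{2}\epsilon(t_k)$. Hence,

$$\sum_{k \in A_S}\epsilon(t_k) \le \epsilon(0)\sum_{k=0}^{\infty}2^{-k} < \infty. \tag{4.12}$$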


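Finally, since $t_{k+1} - t_k = 1/\epsilon^{d-1}(t_k)$ for $k \in A_L$, $t_{k+1} - t_k \le 1/\epsilon^{d-1}(t_k)$ for $k \in A_S$, and $\epsilon(t) \le \epsilon(t_k)$ for $t \in [t_k, t_{k+1}]$, we have

$$\sum_{k\in A_L}\epsilon(t_k) = \sum_{k\in A_L}\epsilon^d(t_k)[t_{k+1}-t_k] = \sum_{k=0}^{\infty}\epsilon^d(t_k)[t_{k+1}-t_k] - \sum_{k\in A_S}\epsilon^d(t_k)[t_{k+1}-t_k] \ge \sum_{t=0}^{\infty}\epsilon^d(t) - \sum_{k\in A_S}\epsilon(t_k) = \infty,$$

where the last equality uses (4.3) and (4.12). This proves (4.8).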

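From (4.12) we conclude that $\sum_{k\in A_S}\epsilon^2(t_k) < \infty$. Also, using (4.7) and (4.4),

$$\sum_{k\in A_L}\epsilon^2(t_k) = \sum_{k\in A_L}\epsilon^{d+1}(t_k)[t_{k+1}-t_k] \le 2^{d+1}\sum_{t=0}^{\infty}\epsilon^{d+1}(t) < \infty,$$

which proves (4.9).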

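Given any $C \in (0,1)$, we define a constant $a$ by $[2(1-C)]^a = 3/2$ if $2(1-C) > 1$; otherwise, we let $a = 1$. Let $B = \{(k,l) : k \ge l \text{ and } f(k,l) \ge a(k-l)\}$. Then

$$\sum_{(k,l)\in B}(1-C)^{f(k,l)}\epsilon(t_k)\epsilon(t_l) \le \sum_{k=0}^{\infty}\sum_{l=0}^{k}\big[(1-C)^a\big]^{k-l}\epsilon(t_k)\epsilon(t_l) < \infty,$$

because $(1-C)^a < 1$ and $\epsilon(t_k)$ is square summable, by (4.9). Now notice that $\epsilon(t_k) \le 2^{-(k-l)+f(k,l)}\epsilon(t_l)$ if $k \ge l$. Hence,

$$\sum_{(k,l)\notin B,\,k\ge l}(1-C)^{f(k,l)}\epsilon(t_k)\epsilon(t_l) \le \sum_{(k,l)\notin B,\,k\ge l}[2(1-C)]^{f(k,l)}\,2^{-(k-l)}\epsilon^2(t_l) \le \sum_{l=0}^{\infty}\sum_{k=l}^{\infty}(3/2)^{k-l}(1/2)^{k-l}\epsilon^2(t_l) < \infty,$$

which proves (4.10). The proof of (4.11) is similar and is omitted. •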
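The interval construction (4.5), (4.6) can be sketched as follows, assuming a nonincreasing schedule given as a Python function on the nonnegative integers and evaluating the test in (4.5) at the truncated time (one reading of the truncation remark). The function `partition_times`, the `horizon` cut-off and the example schedule are hypothetical. The sketch raises an error if $\epsilon$ falls below half of its value in a single step, a case in which (4.6) would return $t_{k+1} = t_k$.

```python
import math

def partition_times(eps, d, horizon):
    """Hypothetical sketch of (4.5)-(4.6) for d >= 1.

    eps: nonincreasing function on the nonnegative integers, values in (0, 1).
    Builds t_0 = 0 < t_1 < ... until some t_k exceeds `horizon` and returns
    (t, A_L, A_S): the times, and the indices k whose t_{k+1} came from
    (4.5) (long intervals) or from (4.6) (short intervals).
    """
    t, A_L, A_S = [0], [], []
    while t[-1] <= horizon:
        k, tk = len(t) - 1, t[-1]
        # (4.5): candidate t_k + 1/eps^{d-1}(t_k), truncated to an integer
        # (assumption: the test in (4.5) is evaluated at the truncated time).
        long_end = tk + math.floor(1.0 / eps(tk) ** (d - 1))
        if eps(long_end) >= 0.5 * eps(tk):
            A_L.append(k)
            t.append(long_end)
            continue
        # (4.6): last integer time at which eps is still >= eps(t_k)/2.
        # This loop stops before long_end, since eps(long_end) < eps(t_k)/2.
        s = tk
        while eps(s + 1) >= 0.5 * eps(tk):
            s += 1
        if s == tk:
            raise ValueError("eps fell below half its value in one step")
        A_S.append(k)
        t.append(s)
    return t, A_L, A_S

# Hypothetical fast-cooling schedule, used with d = 2.
eps = lambda s: 0.5 * 2.0 ** (-s / 10)
t, A_L, A_S = partition_times(eps, d=2, horizon=200)
print(len(A_L), len(A_S))
```

With this schedule the early intervals come from (4.5) and, once $1/\epsilon(t_k)$ is large, the intervals come from (4.6); every interval satisfies (4.7) by construction.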

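We now define

$$S_0 = R^d = \{i : \text{if } V^d(i,j) = 0 \text{ then } V^d(j,i) = 0\},$$

$$S_{n+1} = \{i \in R^{d-1} : i \notin S_0 \cup \dots \cup S_n \text{ and } \exists j \in S_n \text{ such that } V^{d-1}(i,j) = 1\},$$

$$T_0 = \{i \in TR^{d-1} : \exists j \in S_0 \text{ such that } V^{d-1}(j,i) = 1\},$$

and we let $T_1$ be the complement of $T_0$ in $TR^{d-1}$. Notice that $\big(\bigcup_{n\ge 0} S_n\big) \cup T_0 \cup T_1 = \{1,\dots,N\}$.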
Also, if $i \in S_n$, $n \ne 0$, and $V^{d-1}(i,j) = 0$, then $j \in R^{d-1}$ and $j \in S_n$. (To prove this fact, note that if $i \in S_n$, then $i \in R^{d-1}$; so, if $V^{d-1}(i,j) = 0$, then $V^{d-1}(j,i) = 0$ and therefore $j \in R^{d-1}$. Let $l \in S_{n-1}$ be such that $V^{d-1}(i,l) = 1$. Then $V^{d-1}(j,l) = 1$. So either $j \in S_n$ and we are done, or $j \in S_0 \cup \dots \cup S_{n-1}$. In the second case, the same argument shows that $i \in S_0 \cup \dots \cup S_{n-1}$, which is a contradiction.)
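The sets $S_n$, $T_0$, $T_1$ are defined purely in terms of $V^{d-1}$ and $V^d$, so they can be computed mechanically once those integer matrices are known. The sketch below assumes both matrices are given (for instance, produced by Algorithm II) and identifies $R^{d-1}$ by the same rule that defines $R^d = S_0$ above; that identification is our assumption. The helper names and the 4-state matrices are hypothetical, chosen only to exercise the definitions, and are not derived from any Markov chain.

```python
def recurrent_part(V):
    """States i such that V[i][j] == 0 implies V[j][i] == 0 for every j
    (the rule used above for S_0 = R^d, applied here to any V matrix)."""
    n = len(V)
    return {i for i in range(n)
            if all(V[j][i] == 0 for j in range(n) if V[i][j] == 0)}

def layer_sets(V_prev, V_cur):
    """Hypothetical sketch: S_0, S_1, ..., T_0, T_1 from V^{d-1} (V_prev) and V^d (V_cur)."""
    n = len(V_prev)
    R_prev = recurrent_part(V_prev)            # R^{d-1} (assumed rule)
    TR_prev = set(range(n)) - R_prev           # TR^{d-1}
    S = [recurrent_part(V_cur)]                # S_0 = R^d
    used = set(S[0])
    while True:
        # S_{n+1}: unused states of R^{d-1} with V^{d-1}(i, j) = 1 for some j in S_n.
        nxt = {i for i in R_prev - used if any(V_prev[i][j] == 1 for j in S[-1])}
        if not nxt:
            break
        S.append(nxt)
        used |= nxt
    T0 = {i for i in TR_prev if any(V_prev[j][i] == 1 for j in S[0])}
    return S, T0, TR_prev - T0

# Toy matrices (not computed from any Markov chain), states 0..3.
V_prev = [[0, 1, 2, 1],
          [1, 0, 1, 1],
          [2, 1, 0, 2],
          [1, 0, 1, 0]]
V_cur = [[0, 1, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 0, 1],
         [0, 0, 1, 0]]
S, T0, T1 = layer_sets(V_prev, V_cur)
print(S, T0, T1)     # [{0}, {1}, {2}] {3} set()
```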

We let $y(k) = x(t_k)$. We need estimates on the transition probabilities of the $y(k)$ process. These are obtained by noting that, for any $k$, the Markov chain $\{x(t) : t \in [t_k, t_{k+1}]\}$ belongs to $\mathcal{M}_{\epsilon(t_k)}(A, 2^{-K}C_1, C_2, 0)$, where $K = \max\{\alpha_{ij} : \alpha_{ij} < \infty\}$. Since $t_{k+1} - t_k \le 1/\epsilon^{d-1}(t_k)$, Corollary 3.1 may be used to obtain upper bounds. Also, for $k \in A_L$, $t_{k+1} - t_k = 1/\epsilon^{d-1}(t_k)$ and therefore Proposition 3.1 may be used to obtain lower bounds. In more detail, we have:
Lemma 4.2: There are constants $F > 0$, $G > 0$ such that, for every $k \in \mathbb{N}_0$, we have

(i) If $k \in A_L$, then $P(y(k+1) \in S_n \mid y(k) \in S_{n+1}) \ge F\epsilon(t_k)$, $\forall n$. (4.13)

(ii) $P(y(k+1) \notin S_n \mid y(k) \in S_n) \le G\epsilon(t_k)$, $\forall n$. (4.14)

(iii) $P(y(k+1) \notin S_0 \cup T_0 \mid y(k) \in S_0) \le G\epsilon^2(t_k)$. (4.15)

(iv) $P(y(k+1) \notin S_0 \cup T_0 \mid y(k) \in T_0) \le G\epsilon(t_k)$. (4.16)

(v) $P(y(k+1) \in T_0 \mid y(k) \in S_0) \le G\epsilon(t_k)$. (4.17)

(vi) If $k \in A_L$, then $P(y(k+1) \in S_0 \mid y(k) \in T_0) \ge F$. (4.18)

(vii) If $k \in A_L$, then, for all $i$, $P(y(k+1) \in TR^{d-1} \mid y(k) = i) \le 1 - F$. (4.19)


Proof: (i) If $i \in S_{n+1}$, then (by definition) there is some $j \in S_n$ such that $V^{d-1}(i,j) = 1$. The result follows from the lower bound in (3.2).

(ii) Let $i \in S_n$, $j \notin S_n$. We have shown earlier that we must have $V^{d-1}(i,j) \ge 1$, and the result follows from (3.3).

(iii) Let $i \in S_0$ and $j \notin S_0 \cup T_0$. If $j \in S_n$, $n \ne 0$, then $j \notin R^d$; hence $V^d(i,j) \ge 1$. Therefore, using the definition of $V^d$, we have $1 \le V^d(i,j) \le V^d(i,i) + V^{d-1}(i,j) - 1 = V^{d-1}(i,j) - 1$. Hence $V^{d-1}(i,j) \ge 2$. Finally, if $j \in T_1$, then $V^{d-1}(i,j) \ge 2$, because otherwise we would have $j \in T_0$. The result follows from (3.3).

(iv) Let $i \in T_0$ and $j \notin S_0 \cup T_0$. Let us also choose some $l \in S_0$ such that $V^{d-1}(l,i) = 1$ (which exists by the definition of $T_0$). If $j \in S_n$, $n \ne 0$, then $V^{d-1}(i,j) \ge 1$, because otherwise $V^{d-1}(l,j) = 1$, which contradicts the discussion in the proof of part (iii). So, for this case the result follows from (3.3). Suppose now that $j \in T_1$. For any $c \le d-1$ we must have $V^c(i,j) \ge 1$, because otherwise (using Proposition 3.2) $V^{d-1}(l,j) \le V^{d-1}(l,i) + V^c(i,j) = 1$, which contradicts the assumption $j \in T_1$. The result follows again from (3.3).

(v) This is immediate from $V^{d-1}(i,j) \ge 1$, $\forall i \in R^{d-1}$, $\forall j \in TR^{d-1}$ (Proposition 2.2, part (ii)).

(vi) Let $i \in T_0$. Since $i \in TR^{d-1}$, there exists some $j \in R^{d-1}$ such that $V^{d-1}(i,j) = 0$. By the previous discussion, such a $j$ cannot belong to $S_n$ for $n \ge 1$. The result follows from (3.2).


(vii) Similarly, for any $i$ there exists some $j \in R^{d-1}$ such that $V^{d-1}(i,j) = 0$, and the result follows from (3.2). •

Let

$$H_k = P\big(y(n) \in S_0 \cup T_0,\ 0 \le n \le k \mid y(0) \in S_0\big),$$

$$Q_k = P\big(y(k) \in T_0 \mid y(n) \in S_0 \cup T_0,\ 0 \le n \le k-1,\ y(0) \in S_0\big).$$

Using (4.17), (4.18), we obtain

$$Q_{k+1} \le G\epsilon(t_k) + (1 - \chi_k F)Q_k,$$

where $\chi_k = 1$ if $k \in A_L$, and $\chi_k = 0$ otherwise. So,

$$Q_k \le G\sum_{l=0}^{k}(1-F)^{f(k,l)}\epsilon(t_l).$$
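The step from the recursion for $Q_k$ to the explicit bound is the unrolling of a linear recursion. The sketch below iterates the recursion with equality, starting from $Q_0 = 0$ (since $y(0) \in S_0$), and compares it with the explicit sum. With this indexing the sum dominates the iterates once it carries the constant $G/(1-F)$, which we assume is what the generic constant $G$ in the bound is meant to absorb. The function names and the example sequences for $\epsilon(t_k)$ and $\chi_k$ are hypothetical.

```python
def iterate_recursion(eps, chi, G, F):
    """q_{k+1} = G*eps[k] + (1 - chi[k]*F)*q_k with q_0 = 0; returns [q_0, ..., q_K]."""
    q = [0.0]
    for e_k, chi_k in zip(eps, chi):
        q.append(G * e_k + (1 - chi_k * F) * q[-1])
    return q

def explicit_bound(eps, chi, G, F, k):
    """G/(1-F) * sum_{l=0}^{k} (1-F)**f(k,l) * eps[l], where
    f(k,l) = number of m in {l, ..., k-1} with chi[m] == 1."""
    total = 0.0
    for l in range(k + 1):
        f_kl = sum(chi[l:k])
        total += (1 - F) ** f_kl * eps[l]
    return G / (1 - F) * total

# Hypothetical data: eps(t_k) = 1/(k+2); every third interval is long (k in A_L).
eps = [1.0 / (k + 2) for k in range(60)]
chi = [1 if k % 3 == 0 else 0 for k in range(60)]
q = iterate_recursion(eps, chi, G=2.0, F=0.3)
print(all(q[k] <= explicit_bound(eps, chi, 2.0, 0.3, k) for k in range(60)))  # True
```

Each long interval multiplies the accumulated bound by $1 - F$, which is exactly what the exponent $f(k,l)$ counts.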

Using (4.15), (4.16),

$$H_{k+1} \ge \big[1 - G\epsilon(t_k)Q_k - G\epsilon^2(t_k)\big]H_k. \tag{4.20}$$

Now, $\epsilon(t_k)Q_k$ is summable, by (4.10); also, $\epsilon^2(t_k)$ is summable, by (4.9). Hence $\liminf_{k\to\infty} H_k > 0$. More intuitively, once the state enters $S_0$, there is positive probability that it never leaves $S_0 \cup T_0$. Consequently, the total flow of probability into $S_0$ from $S_1$ must be finite. Hence, using (4.13), we have

$$\sum_{k=0}^{\infty}\chi_k\epsilon(t_k)P\big(y(k) \in S_1\big) < \infty.$$

We will prove by induction that, for all $n \ge 1$,

$$\sum_{k=0}^{\infty}\epsilon(t_k)P\big(y(k) \in S_n\big) < \infty. \tag{4.21}$$

Using (4.13), (4.14), we have

$$P\big(y(k+1) \in S_n\big) \ge P\big(y(k) \in S_n\big) - G\epsilon(t_k)P\big(y(k) \in S_n\big) + \chi_k F\epsilon(t_k)P\big(y(k) \in S_{n+1}\big). \tag{4.22}$$

By telescoping inequality (4.22) and using the induction hypothesis (4.21), we see that $\sum_{k=0}^{\infty}\chi_k\epsilon(t_k)P(y(k) \in S_{n+1}) < \infty$. Also, $\sum_{k\in A_S}\epsilon(t_k)P(y(k) \in S_{n+1}) \le \sum_{k\in A_S}\epsilon(t_k) < \infty$ (because of (4.12)), which completes the induction step. Using (4.21) and the fact that $\epsilon(t_k)$ sums to infinity (by (4.8)), we conclude that $\limsup_{k\to\infty} P(y(k) \in S_0 \cup TR^{d-1}) = 1$. We show next that the probability of transient states goes to zero. Inequalities (4.14) and (4.19) imply

$$P\big(y(k+1) \in TR^{d-1}\big) \le G\epsilon(t_k) + (1 - \chi_k F)P\big(y(k) \in TR^{d-1}\big).$$

Thus,

$$P\big(y(k+1) \in TR^{d-1}\big) \le (1-F)^{f(k,0)} + G\sum_{l=0}^{k}(1-F)^{f(k,l)}\epsilon(t_l),$$

which converges to 0 as $k$ tends to infinity, due to (4.11). We may thus conclude that $\limsup_{k\to\infty} P(y(k) \in S_0) = 1$. By repeating the argument that led to (4.20), we can see that the probability that $y$ ever exits $S_0 \cup T_0$, given that $y(k) \in S_0$, converges to zero as $k \to \infty$. (This is a consequence of the square summability of $\epsilon(t_k)$.) It follows that $\lim_{k\to\infty} P(y(k) \in S_0) = 1$. Finally, for any $t \in [t_k, t_{k+1}]$ we have $P(x(t) \in S_0) \ge P(y(k) \in S_0) - G\epsilon(t_k)$, which converges to 1 as $k \to \infty$. This completes the proof of part (i) of the proposition.

For part (ii) of the proposition, in order to avoid introducing new notation, we prove the equivalent statement that if $\sum_t \epsilon^d(t) < \infty$, then $\limsup_{t\to\infty} P(x(t) = i \mid x(0) = i) > 0$, $\forall i \in R^{d-1}$. So, let $i \in R^{d-1}$ and consider the set $R^{d-1}_i$ (the class of $R^{d-1}$ containing $i$). For any $j \notin R^{d-1}_i$, we have $V^{d-1}(i,j) > 0$ and, therefore (using Corollary 3.1), there exists some $G > 0$ such that

$$P\big(y(k+1) \notin R^{d-1}_i \mid y(k) \in R^{d-1}_i\big) \le G\epsilon(t_k), \qquad \forall k.$$

Since we are assuming that $\sum_t \epsilon^d(t) < \infty$, it follows (as in the proof of (4.9)) that $\sum_k \epsilon(t_k) < \infty$. Consequently,

$$\inf_k P\big(y(k) \in R^{d-1}_i \mid y(0) = i\big) > 0. \tag{4.23}$$

Finally, for any $j \in R^{d-1}_i$ we have $V^{d-1}(j,i) = 0$. Hence, using Proposition 3.1, there exists some $F > 0$ such that

$$P\big(x(t_{k+1}) = i \mid x(t_k) \in R^{d-1}_i\big) \ge F. \tag{4.24}$$

By  combining  (4.23),  (4.24),  we  obtain  the  desired  result.  • 

Corollary 4.1: Let the transition probabilities for the simulated annealing algorithm be given by (4.1), (4.2). Consider cooling schedules of the form $T(t) = c/\log t$. The smallest constant $c$ such that, for any initial state, the algorithm converges (in probability) to the set of global minima of $J$ equals the smallest $d$ such that the set of global minima contains $R^d$.

Proof: Let $d^*$ be the smallest such $d$. Having identified $\exp[-1/T(t)]$ with $\epsilon(t)$, we see that the algorithm converges appropriately if and only if $\sum_t \exp[-d^*\log t/c] = \infty$. Equivalently, $\sum_t t^{-d^*/c} = \infty$, which is equivalent to $d^* \le c$. •
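The arithmetic in the proof of Corollary 4.1 can be checked directly: with $T(t) = c/\log t$, $\epsilon(t) = \exp[-1/T(t)] = t^{-1/c}$, so $\epsilon^{d^*}(t) = t^{-d^*/c}$, and the $p$-series $\sum_t t^{-p}$ diverges exactly when $p \le 1$. The sketch below encodes this criterion with exact rational arithmetic to avoid rounding at the boundary $c = d^*$. The function names are hypothetical, and $d^*$ itself must still come from Algorithm II.

```python
import math
from fractions import Fraction

def eps_of_t(t, c):
    """eps(t) = exp(-1/T(t)) for T(t) = c / log t, i.e. t**(-1/c) (t >= 2)."""
    return math.exp(-math.log(t) / c)

def converges_to_global_minima(c, d_star):
    """Corollary 4.1 criterion: sum_t t**(-d_star/c) diverges iff d_star/c <= 1, i.e. c >= d_star."""
    return Fraction(d_star) / Fraction(c) <= 1

print(eps_of_t(100, 2))                                  # close to 0.1
print(converges_to_global_minima(2, 2))                  # True  (c = d*)
print(converges_to_global_minima(Fraction(3, 2), 2))     # False (c < d*)
```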
Proposition 4.1 can be applied to any continuous time simulated annealing algorithm, because in that case we may sample the Markov chain at integer times and condition (2.3) will be automatically true. For discrete time algorithms, even if (2.3) fails, the result is still valid for any structure $A$ such that the estimates (2.8) of Proposition 2.3 are true (with an appropriate choice of $V(i,j)$). We have seen in Section II that this is the case for a much broader class of Markov chains. In fact, we conjecture that Proposition 4.1 is always true, provided that the sets $R^d$ are correctly defined.

Another possibility for generalizing Proposition 4.1 comes from allowing the schedule $\epsilon(t)$ to be non-monotonic. In fact, the proof goes through (with a minor modification in the definition of the sequence $\{t_k\}$) if we only assume that there exists some $C > 0$ such that $\epsilon(t) \le C\epsilon(s)$, $\forall t \ge s$, which allows for mild non-monotonicity. On the other hand, if $\epsilon(t)$ is allowed to have more substantial variations, then the conclusions of Proposition 4.1 are no longer true. For a simple example, consider the Markov chain of Figure 1, together with the schedule $\epsilon(t) = t^{-1/2}$ if $t$ is even, and $\epsilon(t) = 1/t$ if $t$ is odd. For this schedule, the largest integer $d$ for which $\sum_{t=1}^{\infty}\epsilon^d(t) = \infty$ is equal to 2. Also, $R^2 = \{3\}$. On the other hand, $P(x(t) = 3 \mid x(0) = 1)$ does not converge to 1.
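Both properties of this non-monotonic schedule can be checked in a few lines. On the even times $\epsilon^d(t) = t^{-d/2}$, whose sum diverges exactly when $d \le 2$, while on the odd times $\epsilon^d(t) = t^{-d}$ diverges only for $d \le 1$. Moreover, the ratio $\epsilon(t)/\epsilon(t-1) = (t-1)/\sqrt{t}$ at even $t$ grows without bound, so no constant $C$ with $\epsilon(t) \le C\epsilon(s)$, $\forall t \ge s$, exists. The sketch assumes the schedule is a power law $t^{-p}$ along each of finitely many arithmetic subsequences of times; the helper names are hypothetical.

```python
import math
from fractions import Fraction

def largest_divergent_power(exponents):
    """Largest integer d with sum_t eps(t)**d = infinity, ASSUMING eps(t) = t**(-p)
    along an arithmetic subsequence of times for each p in `exponents` (p > 0).
    Along such a subsequence the sum diverges iff p*d <= 1, i.e. d <= 1/p."""
    return max(math.floor(1 / Fraction(p)) for p in exponents)

def eps_example(t):
    """The schedule from the text: t**(-1/2) for even t, 1/t for odd t (t >= 1)."""
    return t ** -0.5 if t % 2 == 0 else 1.0 / t

print(largest_divergent_power([Fraction(1, 2), Fraction(1)]))   # 2
# eps(t)/eps(t-1) at even t equals (t-1)/sqrt(t), which is unbounded:
print([eps_example(t) / eps_example(t - 1) for t in (100, 10_000)])
```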

We have claimed that our result generalizes the results of [9], and we end the paper by supporting this claim. Hajek's result characterizes $d^*$ explicitly, as the maximum depth¹ of local minima which are not global minima, under a “weak reversibility” assumption, which is equivalent to imposing certain restrictions on the structure $A$. Our characterization is less explicit because, instead of describing $d^*$, we give an algorithm for computing it in terms of $A$. Nevertheless, for the class of structures $A$ considered in [9], we can use our Algorithm II to show that $R^d$ is the set of all local minima of the cost function $J$ of depth $d + 1$ or more. Hence, the $d^*$ produced by our algorithm is the smallest $d$ such that all local (but not global) minima have depth $d$ or less, which agrees with the result of [9]. We do not present the details of this argument, since it would amount to rederiving a known result.


1. The depth of a state $i$ is defined as the minimum over all $j$ such that $J(j) < J(i)$, of the minimum over all paths leading from $i$ to $j$, of the maximum of $J(k) - J(i)$ over all $k$'s belonging to that path; the depth of $i$ is infinite if no such $j$ exists.
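The footnote's definition of depth is a minimax (bottleneck) path computation. The sketch below assumes, as a hypothetical setup, that the allowed moves are the pairs with $Q(i,j) > 0$; it computes for every pair the smallest possible maximum of $J$ along a path using a Floyd-Warshall-style closure over (min, max), then applies the footnote's formula. The five-state cost function on a line is made up for illustration. The last line takes the largest depth among states that are not global minima; this equals the maximum depth of local but not global minima, because a state that is not a local minimum has a lower neighbour and hence depth 0.

```python
import math

def depths(J, Q):
    """Depth of each state as in the footnote: min over j with J[j] < J[i] of the
    min over paths i -> j (moves a -> b allowed when Q[a][b] > 0) of
    max_{k on the path} J[k] - J[i]; math.inf if no such j exists."""
    n = len(J)
    # h[a][b]: smallest possible max of J over the states of a path from a to b.
    h = [[J[a] if a == b else (max(J[a], J[b]) if Q[a][b] > 0 else math.inf)
          for b in range(n)] for a in range(n)]
    for m in range(n):                      # allow m as an intermediate state
        for a in range(n):
            for b in range(n):
                h[a][b] = min(h[a][b], max(h[a][m], h[m][b]))
    return [min((h[i][j] - J[i] for j in range(n) if J[j] < J[i]), default=math.inf)
            for i in range(n)]

# Hypothetical example: five states on a line, moves to neighbours (or staying put).
J = [0, 3, 1, 4, 2]
Q = [[0.5, 0.5, 0, 0, 0],
     [0.5, 0, 0.5, 0, 0],
     [0, 0.5, 0, 0.5, 0],
     [0, 0, 0.5, 0, 0.5],
     [0, 0, 0, 0.5, 0.5]]
dep = depths(J, Q)
print(dep)                                                  # [inf, 0, 2, 0, 2]
# Largest depth among states that are not global minima:
print(max(d for d, cost in zip(dep, J) if cost > min(J)))   # 2
```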